The Sequential GMM: A Gaussian Mixture Model Based Speaker Verification System that Captures Sequential Information

Author

  • Stephen James Stafford
Abstract

This report presents a novel speaker verification system that generates a new feature set capturing long-duration speaker-identifying characteristics while taking advantage of the well-established and well-studied Gaussian Mixture Model (GMM) system. Much of the system's innovation lies in the intelligent exploitation of traditional cepstral features, so that temporal aspects of speech, which traditional GMM frameworks otherwise disregard, can be modeled explicitly. The system consists of a collection of independent GMMs, one for each phoneme, built on these long-duration feature vectors. The outputs of these GMMs are then combined at the score level using a neural network. Although the GMM and the front-end feature extraction are traditional tools, combining this system with a conventional GMM system dramatically reduced both the equal error rate (EER) and the minimum value of the decision cost function (DCF) on a standard speaker verification test set, compared with the GMM system alone: the min DCF fell by nearly 65% and the EER by approximately 36%. This improvement indicates that the long-duration features capture speaker-characterizing information that the regular GMM ignores. Moreover, when operating in isolation, the new system's performance approached that of the state-of-the-art GMM.

1 Introduction

Speaker recognition is a task that is familiar to everyone. When answering the telephone, people often know immediately who is on the other end of the line. Unfortunately, speaker recognition is not such a simple task for computers. Part of the problem is that it is difficult for humans to determine which characteristics they use to identify speakers. Perhaps they recognize a phrase the person commonly uses, or maybe just the way the person laughs. Human-based speaker recognition can be studied, and has been to some extent [1].
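The abstract reports its gains in terms of the EER and the minimum DCF. As a minimal sketch (not the report's own scoring tool), both figures can be computed from two lists of verification trial scores: scores for true-speaker (target) trials and scores for impostor (non-target) trials, with a trial accepted when its score is at or above a threshold. The function names, the accept rule, and the cost parameters C_miss = 10, C_fa = 1, P_target = 0.01 are assumptions modeled on NIST speaker recognition evaluation conventions; the abstract does not state which DCF parameters were used.

```python
from typing import Sequence, Tuple


def error_rates(target: Sequence[float], nontarget: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (P_miss, P_fa) when every trial scoring >= threshold is accepted."""
    p_miss = sum(s < threshold for s in target) / len(target)
    p_fa = sum(s >= threshold for s in nontarget) / len(nontarget)
    return p_miss, p_fa


def eer_and_min_dcf(target: Sequence[float], nontarget: Sequence[float],
                    c_miss: float = 10.0, c_fa: float = 1.0,
                    p_target: float = 0.01) -> Tuple[float, float]:
    """Sweep thresholds over the observed scores; return (EER, unnormalized min DCF).

    The cost defaults are assumed NIST SRE-style values, not taken from the report.
    """
    if not target or not nontarget:
        raise ValueError("need at least one target and one non-target score")
    scores = sorted(set(target) | set(nontarget))
    # One threshold per observed score, plus one above the maximum so the
    # "reject everything" operating point is also considered.
    thresholds = scores + [scores[-1] + 1.0]
    eer, best_gap, min_dcf = 1.0, float("inf"), float("inf")
    for t in thresholds:
        p_miss, p_fa = error_rates(target, nontarget, t)
        # EER approximation: the threshold where the two error rates are closest,
        # reported as their mean (no interpolation between operating points).
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, eer = gap, (p_miss + p_fa) / 2
        # Detection cost at this threshold; keep the minimum over all thresholds.
        dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        min_dcf = min(min_dcf, dcf)
    return eer, min_dcf


if __name__ == "__main__":
    tgt = [2.0, 1.5, 0.8, 0.3]     # hypothetical true-speaker trial scores
    non = [-1.0, -0.5, 0.5, 0.9]   # hypothetical impostor trial scores
    print(eer_and_min_dcf(tgt, non))
```

For these hypothetical scores the sweep gives an EER of 0.25 and a min DCF of about 0.05. The sweep is quadratic in the number of trials, which is acceptable for illustration; a larger-scale scorer would sort the scores once and update the error counts incrementally. Many evaluations also normalize the DCF by the cost of the best trivial system, min(C_miss · P_target, C_fa · (1 − P_target)); that step is omitted here.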
However, perhaps humans are not the optimal system; perhaps machines can do much better. There are a number of distinguishing speech characteristics that can be utilized, such as acoustic qualities, prosodic patterns, pronunciation preferences, and word usage, to name a few. The sources of these different pieces of information depend on factors ranging from the shape of the nasal passage to where the person was raised [2]. The aspiration for speaker recognition systems is to use all of the above-mentioned sources of information; simply stated, the goal is to capture every piece of information that reveals the identity of the speaker. The difficulty, …

Publication date: 2005